The Measles Project

Comparing Private & Public Schools

Introduction and Data

Motivation and Context

This project grew out of the measles dataset published by TidyTuesday. We were interested in how the dataset categorizes schools, so our team imported the data into R and used those categories to guide our questions and plots. The data, as reported by the Wall Street Journal, underlie a map of measles vaccination rates in schools across the United States, which shows that some schools have dangerously low vaccination rates and therefore a higher risk of measles outbreaks. The data cover the 2018-2019 school year and include public, private, and charter schools. According to the map, many of the schools with the lowest vaccination rates are in states such as Idaho, Utah, Colorado, and Oregon. The article also notes that measles cases have been rising in recent years and emphasizes the importance of vaccination in preventing the spread of this highly contagious virus. This narrative, together with the raw data, shaped our research question and hypothesis.

Research question: In the 2018-2019 school year, how did the measles vaccination rate differ across regions of the US? How do school districts in regions with high vaccination rates differ from those in regions with low vaccination rates (public vs. private, enrollment statistics, and type of geographic area: urban or rural)?

Hypothesis: We expect states with higher rates of urbanization to have higher MMR vaccination rates. We also expect schools in these states to have higher vaccination rates, differing mainly by whether they are private or public (private schools may have higher rates).

The Data:

Quantitative - vaccination rates

Categorical - school type (private or public), county, and state

Ethical Issues

Vaccination records are treated as protected health information under HIPAA and generally cannot be shared without patient consent. In addition, some areas do not require schools to report vaccine data. The data could also be used to discriminate unjustly against certain areas. Not all parts of the US benefit equally from our research, and some areas may be more or less willing to disclose their vaccination information, so the study may not affect our target population equally.

Limitations

We saw negative values in the data, but we do not know what they mean; they may indicate that a school did not report a vaccination rate.
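One way to handle the negative values is sketched below. It assumes they are placeholders for "not reported" and recodes them to `NA` so they cannot distort means or plots. The toy data frame and the choice of the `mmr` and `overall` columns are illustrative assumptions, not the report's actual cleaning step.

```r
# Toy stand-in for the school-level data; values are made up for illustration.
schools <- data.frame(
  name    = c("School A", "School B", "School C", "School D"),
  mmr     = c(100, -1, 95.5, 88),
  overall = c(-1, -1, 94, -1)
)

# Assumption: a negative rate is a placeholder for "not reported",
# so it is recoded to NA rather than treated as a real percentage.
rate_cols <- c("mmr", "overall")
for (col in rate_cols) {
  schools[[col]][schools[[col]] < 0] <- NA
}

# Count how many schools lack each rate, then average the reported MMR rates.
missing_counts <- colSums(is.na(schools[rate_cols]))
mean_mmr <- mean(schools$mmr, na.rm = TRUE)
print(missing_counts)
print(mean_mmr)
```

Recoding to `NA` keeps the rows in the data, so the number of non-reporting schools can still be counted per state or school type.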

Some states do not require schools to report MMR vaccination data, so information is missing for a few states. Because the data from those states are inconclusive, they are not considered in this analysis.

Citing Sources

[1] “Urban Percentage of the Population for States, Historical.” Iowa State University | Iowa Community Indicators Program, Iowa State University, www.icip.iastate.edu/tables/population/urban-pct-states.

Methodology

Visualizations and Summary Statistics

Rows: 66,113
Columns: 16
$ index    <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10, 11, 12, 13, 14, 15, 15, 16…
$ state    <chr> "Arizona", "Arizona", "Arizona", "Arizona", "Arizona", "Arizo…
$ year     <chr> "2018-19", "2018-19", "2018-19", "2018-19", "2018-19", "2018-…
$ name     <chr> "A J Mitchell Elementary", "Academy Del Sol", "Academy Del So…
$ type     <chr> "Public", "Charter", "Charter", "Charter", "Charter", "Public…
$ city     <chr> "Nogales", "Tucson", "Tucson", "Phoenix", "Phoenix", "Phoenix…
$ county   <chr> "Santa Cruz", "Pima", "Pima", "Maricopa", "Maricopa", "Marico…
$ district <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ enroll   <dbl> 51, 22, 85, 60, 43, 36, 24, 22, 26, 78, 78, 35, 54, 54, 34, 5…
$ mmr      <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 1…
$ overall  <dbl> -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -…
$ xrel     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ xmed     <dbl> NA, NA, NA, NA, 2.33, NA, NA, NA, NA, NA, NA, 2.86, NA, 7.41,…
$ xper     <dbl> NA, NA, NA, NA, 2.33, NA, 4.17, NA, NA, NA, NA, NA, NA, NA, N…
$ lat      <dbl> 31.34782, 32.22192, 32.13049, 33.48545, 33.49562, 33.43532, 3…
$ lng      <dbl> -110.9380, -110.8961, -111.1170, -112.1306, -112.2247, -112.1…

Figure 1. Each data point represents a school, and its color represents the reported MMR vaccination rate at that school: dark blue indicates a low (or zero) vaccination rate, and dark red indicates a rate at or near 100%. This visualization is used to 1) understand the variability of school vaccination rates across states, and 2) identify states with substantially different MMR vaccination rates for comparison.

Warning in state == c("Illinois", "Massachusetts", "Pennsylvania", "Arkansas", :
longer object length is not a multiple of shorter object length

Figure 2. This plot shows the six states with the highest and lowest MMR vaccination rates among the states sampled. The states with the lowest reported MMR vaccination rates are Arkansas, Washington, and Minnesota. The states with the highest reported MMR vaccination rates are Massachusetts, Illinois, and Pennsylvania.
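The warning printed before Figure 2 (`longer object length is not a multiple of shorter object length`) is what R emits when a column is compared to a vector of several values with `==`. The comparison is made position by position with recycling, so rows can be dropped silently. The sketch below uses made-up rates and a hypothetical `focus_states` vector to contrast `==` with `%in%`. It illustrates the general behavior and is not the report's actual filtering code.

```r
# Toy data; only the state names come from the report, the rates are made up.
schools <- data.frame(
  state = c("Illinois", "Ohio", "Arkansas", "Washington",
            "Massachusetts", "Minnesota", "Pennsylvania"),
  mmr   = c(97, 98, 80, 89, 97, 90, 96)
)
focus_states <- c("Illinois", "Massachusetts", "Pennsylvania",
                  "Arkansas", "Washington", "Minnesota")

# `==` recycles focus_states along the column and compares position by position,
# so a row matches only if its state lines up with the recycled vector.
# The recycling warning is suppressed here only to keep the output short.
by_equals <- suppressWarnings(schools[schools$state == focus_states, ])

# `%in%` asks, for every row, whether its state appears anywhere in focus_states.
by_in <- schools[schools$state %in% focus_states, ]

print(nrow(by_equals))
print(nrow(by_in))
```

In this toy example `%in%` keeps every row from the six states of interest, while `==` keeps only the rows that happen to align with the recycled vector. If the plotted data were filtered with `==`, it may be worth re-running the figure with `%in%` to confirm the state means.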

# A tibble: 21 × 3
   state           max   min
   <chr>         <dbl> <dbl>
 1 Arizona       100    15.4
 2 Arkansas       96.1  17.2
 3 California     99     1  
 4 Colorado      100    16.2
 5 Connecticut   100    67.9
 6 Illinois      100    10.4
 7 Maine         100    38.5
 8 Massachusetts 100     3  
 9 Minnesota     100    20  
10 Missouri       99     1  
# … with 11 more rows
# A tibble: 682 × 3
# Groups:   county, state [662]
   county       state        mmr
   <chr>        <chr>      <dbl>
 1 Adams        Colorado     100
 2 Adams        Illinois     100
 3 Addison      Vermont      100
 4 Alameda      California    99
 5 Alamosa      Colorado     100
 6 Albany       New York     100
 7 Alexander    Illinois     100
 8 Allegany     New York     100
 9 Allen        Ohio         100
10 Androscoggin Maine        100
# … with 672 more rows

Figure 3.

Warning: Removed 7411 rows containing missing values (`geom_point()`).

Figure 4. No data were reported for BOCES and non-public schools. This visualization does not support the idea that enrollment numbers influence reported MMR vaccination rates, though it may suggest that high enrollment in public schools and in the other school types (excluding BOCES and non-public schools) is associated with higher reported MMR vaccination rates.

Figure 5.

State highest vs. lowest (county x, county y)

\(H_0: \mu_{public}-\mu_{private}=0\): the difference in mean vaccination rate between public and private schools is 0, where \(\mu\) denotes the population mean vaccination rate for each school type.

\(H_A: \mu_{public}-\mu_{private}\ne0\): the difference in mean vaccination rate between public and private schools is not 0.

# A tibble: 2 × 2
  type        m
  <chr>   <dbl>
1 Private  62.4
2 Public   91.8
Warning: Please be cautious in reporting a p-value of 0. This result is an
approximation based on the number of `reps` chosen in the `generate()` step. See
`?get_p_value()` for more information.
# A tibble: 1 × 1
  p_value
    <dbl>
1       0

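The simulated p-value is reported as 0, which means that none of the simulated differences were as extreme as the observed difference of 29.4 percentage points (91.8 vs. 62.4); the true p-value is therefore smaller than 1 divided by the number of `reps`, not exactly 0. We reject the null hypothesis (p < .05) and find strong evidence for the alternative hypothesis that the mean vaccination rate differs between public and private schools, with public schools having the higher observed mean.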
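The warning above mentions `generate()` and `get_p_value()`, which suggests a simulation-based test. The following base R sketch shows the same idea as a permutation test on made-up data. The sample sizes, means, seed, and the helper `diff_in_means()` are all illustrative assumptions, and the `+ 1` correction is one common way to avoid reporting a p-value of exactly 0.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Made-up school-level rates; not the report's data.
schools <- data.frame(
  type = rep(c("Public", "Private"), times = c(40, 30)),
  mmr  = c(rnorm(40, mean = 92, sd = 5), rnorm(30, mean = 85, sd = 10))
)

# Hypothetical helper: difference in mean rate, public minus private.
diff_in_means <- function(mmr, type) {
  mean(mmr[type == "Public"]) - mean(mmr[type == "Private"])
}

observed <- diff_in_means(schools$mmr, schools$type)

# Shuffle the type labels to simulate the null hypothesis of no difference.
reps <- 1000
null_diffs <- replicate(reps, diff_in_means(schools$mmr, sample(schools$type)))

# Two-sided p-value; the +1 terms keep it from being reported as exactly 0.
p_value <- (sum(abs(null_diffs) >= abs(observed)) + 1) / (reps + 1)
print(observed)
print(p_value)
```

With this correction the smallest reportable p-value is 1 / (reps + 1), which makes the resolution of the simulation explicit.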

Models & Prediction

[1] 0.2799859

The interactive model has an adjusted R-squared of 0.2799859.

[1] 0.2078126

The additive model has an adjusted R-squared of 0.2078126.

The higher adjusted R-squared for the interactive model suggests that it fits the data better overall than the additive model.
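The comparison between the two models can be sketched as follows. The report does not state which predictors were used, so `enroll` and `type` are assumptions chosen for illustration, and the data are simulated. Only the pattern of fitting an additive and an interaction model and reading off the adjusted R-squared is the point.

```r
set.seed(2)  # arbitrary seed so the sketch is reproducible

# Simulated data; `enroll` and `type` are assumed predictors for illustration.
n <- 200
schools <- data.frame(
  enroll = round(runif(n, 10, 500)),
  type   = rep(c("Public", "Private"), each = n / 2)
)
# Give the two school types different slopes so an interaction is present.
slope <- ifelse(schools$type == "Public", 0.01, -0.02)
schools$mmr <- 90 + slope * schools$enroll + rnorm(n, sd = 3)

# Additive model: one common slope for enrollment, shifted by school type.
fit_additive <- lm(mmr ~ enroll + type, data = schools)
# Interaction model: the enrollment slope is allowed to differ by school type.
fit_interaction <- lm(mmr ~ enroll * type, data = schools)

adj_r2 <- c(
  additive    = summary(fit_additive)$adj.r.squared,
  interaction = summary(fit_interaction)$adj.r.squared
)
print(adj_r2)
```

Adjusted R-squared penalizes the extra interaction term, so a higher value for the interaction model indicates that the added flexibility explains more than its additional parameter costs.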

# A tibble: 1 × 1
  .pred
  <dbl>
1  92.9

Hypothesis Testing

For this hypothesis test, let:

\(\pi_{priv}\) = the true proportion of students enrolled in private elementary schools who have the MMR vaccination in the United States (for states with data collected)

\(\pi_{pub}\) = the true proportion of students enrolled in public elementary schools who have the MMR vaccination in the United States (for states with data collected)

Null Hypothesis

\(H_0\): \(\pi_{priv} = \pi_{pub}\) ; the true proportion of students enrolled in private elementary schools with the MMR vaccination is equal to the true proportion of students enrolled in public elementary schools with the MMR vaccination in the United States (for states with data collected)

Alternative Hypothesis

\(H_a\): \(\pi_{priv} \neq \pi_{pub}\) ; the true proportion of students enrolled in private elementary schools with the MMR vaccination is different from the true proportion of students enrolled in public elementary schools with the MMR vaccination in the United States (for states with data collected)
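A two-sided test of these hypotheses could be sketched with base R's `prop.test()`. The counts below are hypothetical and are not taken from the dataset. In practice they might be approximated from each school's `mmr` rate and `enroll` count, which is itself an assumption about how the data would be aggregated.

```r
# Hypothetical counts, not taken from the dataset:
# vaccinated students and total enrollment for each school type.
vaccinated <- c(private = 4300, public = 18600)
enrolled   <- c(private = 5000, public = 20000)

# Two-sided test of H0: pi_priv = pi_pub against Ha: pi_priv != pi_pub.
result <- prop.test(x = vaccinated, n = enrolled, alternative = "two.sided")

print(result$estimate)  # sample proportions for the two groups
print(result$p.value)
```

A small p-value here would indicate that the two proportions differ. The sign of the difference in `result$estimate` shows which school type has the higher sample proportion.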

Results

Figure 1 - This visualization addresses the first research question by showing which states have high MMR vaccination rates across schools (such as Ohio) and which have low rates (such as Florida). Ohio and Florida show little variability across schools, whereas California shows a lot. Overall, the map gives a general picture of how MMR vaccination rates differ across regions of the US.

Figure 2 - To explore whether there is a relationship between urban or rural status and MMR vaccination rate, we first plotted the states with the highest (Massachusetts, Illinois, and Pennsylvania) and lowest (Arkansas, Washington, and Minnesota) mean MMR vaccination rates. On its own, this plot does not show any relationship between urban or rural status and MMR vaccination rate, so we referenced information from the 2010 U.S. Decennial Census, provided by Iowa State University [1]. The percentage of each state's total population living in urban areas is: Massachusetts (92%), Illinois (88.5%), Pennsylvania (78.7%), Arkansas (56.2%), Washington (84.1%), and Minnesota (73.3%). There appears to be only a weak relationship between a state's urbanization and its MMR vaccination rate, though a slight trend is visible. For instance, Arkansas has the largest gap in mean MMR vaccination rate (80.49%) relative to the states with the highest means (>95%), and it also has a much lower percentage of its population in urban areas. Massachusetts and Illinois have comparatively high urban percentages and the highest mean MMR vaccination rates, 97.04% and 97.39%, respectively. However, Washington weakens the pattern: it has a high urban percentage but reports a comparatively low mean MMR vaccination rate (89.3%).
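To put a number on the trend described for Figure 2, the sketch below computes a Pearson correlation from the four states for which this paragraph quotes both an urban percentage and a mean MMR rate. Pennsylvania and Minnesota are left out because their exact means are not stated. With only four points the coefficient is descriptive at best and should not be read as a test of the hypothesis.

```r
# Values quoted in the Figure 2 discussion; Pennsylvania and Minnesota are
# omitted because their exact mean MMR rates are not stated in the text.
states <- data.frame(
  state     = c("Massachusetts", "Illinois", "Arkansas", "Washington"),
  urban_pct = c(92.0, 88.5, 56.2, 84.1),
  mean_mmr  = c(97.04, 97.39, 80.49, 89.3)
)

# Pearson correlation between urban share and mean MMR rate (n = 4 only).
r <- cor(states$urban_pct, states$mean_mmr)
print(r)
```

A fuller check would use the mean MMR rate and urban percentage for every state in the dataset rather than the extremes selected for the plot.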

Figure 3 -

Figure 4 - For charter, kindergarten, and private schools, the reported vaccination rates among schools with no new enrollments vary widely. Public schools show a little more spread, but no strong correlation either. Public schools with 0 new enrollments may report high vaccination rates or none at all, but in general high enrollment rarely coincides with low vaccination rates (this also applies to charter, kindergarten, and private schools). This visualization addresses whether enrollment has an effect on vaccination rates, and we conclude that enrollment is not a determining factor.

Figure 5 - The graph of average vaccination rates for private and public schools shows a higher average for public schools. This may be due to local government rules and control over public schools, under which students may be required to be vaccinated, whereas in private schools vaccination may be left more to individual choice and outside government control.